DETAILED ACTION



Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1-6, 10-16 and 20-21 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Hommersom et al. (US 6,134,565) in view of Cohen (US
5,752,051).

Regarding claim 1, Hommersom discloses a document processing for
extracting a text block from a document (Hommersom, in col. 3, lines 1-5, states that FIG. 2
illustrates an apparatus for recognizing separate articles in a source document.
Hommersom, in col. 3, lines 51-60, states "clustering of adjoining information carrying
pixels are sought in the image and are characterized as characters, line, graphic or
photo and in the second step information of the character type is divided into text block,
lines and words" and, in col. 4, lines 12-15, states "segmentation result is
filtered so that only object of the type text block, title, graphic and lines are retained".
In the system of Hommersom, the step of segmentation and filtering in which objects of
the type text block, title, graphic and lines are retained corresponds to a document processing
for extracting a text block from a document) comprising the steps:

generating objects including characters, marks and other symbols from the
document (Hommersom, in col. 3, lines 51-60, states "clustering of adjoining information
carrying pixels are sought in the image and are characterized as characters, line,
graphic or photo and in the second step information of the character type is divided into
text block, lines and words" and, in col. 4, lines 12-15, states
"segmentation result is filtered so that only object of the type text block, title, graphic
and lines are retained". In the system of Hommersom, retaining objects of the types text
block, line and graphic corresponds to generating and extracting objects
including characters [text blocks], marks [lines] and other symbols [graphics] from the
document);

generating a connection candidate between the objects (Hommersom, in col. 4,
lines 12-15, states "segmentation result is filtered so that only object of
the type text block, title, graphic and lines are retained" in the article, and Hommersom,
in col. 5, lines 48-55, states "initially all the objects are designated as an article
[part of the article]. The operation of the interpreter is now intended to combine objects
into groups by applying the rule successfully". In the system of Hommersom, the
segmentation result, the designation of all objects as an article and the application of rules to
combine objects correspond to generating a connection candidate [article as a whole]
between the objects); and

evaluating validity of connection candidate using a language model
(Hommersom, in col. 4, lines 23-27, states "the actual analysis of the segmentation
image take place using interpreter with number of rules based on the conventional
lay-out of the document being processed. A set of rules for different documents are stored
in the memory" and, in col. 5, lines 48-55, states "initially all the
objects are designated as an article. The operation of the interpreter is now intended to
combine objects into groups by applying the rule successfully. In these condition all the
objects are analyzed consecutively, each object is tested in relationship with all other
objects by reference to the rule applied. If the outcome of the test is positive, the
second object is added attached to it". In the system of Hommersom, initially all the
objects are designated as an article and all the objects are analyzed consecutively;
each object is tested in relationship with all other objects by reference to the rule applied
and, if the outcome of the test is positive, the second object is attached to the first
object, which corresponds to evaluating validity of connection candidate using a language
model, and the interpreter, which is intended to combine objects into groups by applying the
rule, corresponds to the language model).

Hommersom, however, has not explicitly shown that characters are laid out using
blank characters, or generating blank characters.

In the same field of endeavor, Cohen discloses that characters are laid out using
blank characters and generating blank characters (Cohen, in col. 3, lines 43-53, states
"the sample text is filtered to remove unwanted characters. Typically punctuation and
numerals are replaced by the stop characters flanked by blanks. Formatting codes
such as carriage returns replaced by blanks" and, in col. 7, lines 35-40, states that the
logical steps are implemented using hardware and software [computer and computer
program]. In the system of Cohen, replacing punctuation and numerals by stop
characters flanked by blanks, and replacing formatting codes such as carriage returns
by blanks, corresponds to characters being laid out using blank characters and
generating blank characters).
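
The filtering step for which Cohen is cited can be pictured with a minimal sketch. It is an illustration under stated assumptions, not Cohen's implementation: the function name `filter_sample_text`, the choice of `|` as the stop character, the treatment of every non-letter as "unwanted", and the collapsing of repeated blanks are all hypothetical and are not taken from either reference.

```python
import re

STOP = "|"  # hypothetical stop character; the cited passage does not name one


def filter_sample_text(text: str, stop: str = STOP) -> str:
    """Replace formatting codes with blanks and unwanted runs with a stop character."""
    # Formatting codes such as carriage returns, line feeds and tabs become blanks.
    text = re.sub(r"[\r\n\t]", " ", text)
    # Assumption: anything that is not an ASCII letter or a blank counts as
    # punctuation or a numeral. Each run becomes one stop character flanked by blanks.
    text = re.sub(r"[^A-Za-z ]+", lambda _m: f" {stop} ", text)
    # Assumption: repeated blanks are collapsed so words stay separated by one blank.
    return re.sub(r" +", " ", text).strip()


if __name__ == "__main__":
    print(filter_sample_text("Hello, world\r\n42 times"))
```

After filtering, every word is delimited by a single blank, which is the property the rejection relies on when it refers to words being joined across a common delimiter.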

Therefore it would have been obvious to one having ordinary skill in the art at
the time the invention was made to lay out characters using blank characters and
to generate blank characters, as shown by Cohen, in the system of Hommersom by
filtering the segmented objects (Hommersom, col. 4, lines 12-15) to replace unwanted
characters with stop characters flanked by blanks and to replace formatting
codes such as carriage returns with blanks in the system of Hommersom, because
such a process provides a necessary step for connecting words and phrases in the
document (as stated by Cohen in col. 5, lines 35-58, a word is a string of consecutive
symbols which is separated from adjoining symbols by "spaces". Two adjacent words
are joined together as a phrase, including the common delimiter [blank space], if the symbols
on either side of the common delimiter [blank space] jointly contribute to a significantly
scoring n-gram, without requiring training), thereby providing efficient/faster
processing for connecting or combining objects in the document, as stated by Cohen in
col. 7, lines 22-30.

Regarding claim 2, Hommersom discloses determining if a connection is valid
(Hommersom, in col. 3, lines 1-5, states that FIG. 2 illustrates an apparatus for recognizing
separate articles in a source document. Hommersom, in col. 5, lines 48-55, states
"initially all the objects are designated as an article. The operation of the interpreter is
now intended to combine objects into groups by applying the rule successfully. In these
condition all the objects are analyzed consecutively, each object is tested in relationship
with all other objects by reference to the rule applied. If the outcome of the test is
positive, the second object is added attached to it". In the system of Hommersom, if the
outcome of the test is positive, the second object is attached to the first, which corresponds
to determining if a connection is valid); and

combining the objects corresponding to source and destination of connection
candidate, if it is determined that the connection of candidate is valid (Hommersom, in
col. 5, lines 52-55, states "In these condition all the objects are analyzed
consecutively, each object is tested in relationship with all other objects by reference to
the rule applied. If the outcome of the test is positive, the second object is added
attached to it", which corresponds to combining the objects corresponding to source
[first object] and destination of connection candidate [second object], if it is determined
that the connection of candidate is valid).
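
The pairwise test-and-attach behavior attributed to Hommersom's interpreter can be sketched as follows. This is a hedged illustration only: the names `group_objects` and `vertically_adjacent`, the `(left, top, right, bottom)` box layout, and the 10-unit gap are hypothetical, and the single rule shown stands in for the layout rules of the reference, which are not reproduced in this action.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout


def group_objects(boxes: List[Box], rule: Callable[[Box, Box], bool]) -> List[int]:
    """Return a group label per object after testing every ordered pair."""
    # Start with one group per object, then merge whenever the rule test is positive.
    group = list(range(len(boxes)))
    for i in range(len(boxes)):
        for j in range(len(boxes)):
            if i != j and group[i] != group[j] and rule(boxes[i], boxes[j]):
                # Positive outcome: attach the second object's group to the first's.
                old, new = group[j], group[i]
                group = [new if g == old else g for g in group]
    return group


def vertically_adjacent(a: Box, b: Box, gap: int = 10) -> bool:
    """Hypothetical rule: b starts at most `gap` units below a and overlaps it horizontally."""
    return 0 <= b[1] - a[3] <= gap and a[0] < b[2] and b[0] < a[2]


if __name__ == "__main__":
    boxes = [(0, 0, 100, 20), (0, 25, 100, 45), (200, 0, 300, 20)]
    print(group_objects(boxes, vertically_adjacent))
```

In this sketch the first two boxes end up in one group and the third stays separate, mirroring the "source" and "destination" objects being combined only when the test is positive.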

Regarding claim 3, Hommersom discloses the object generated is associated
with a coordinate indicating a position of the document (Hommersom, in col. 4, lines
5-10 and 16-20, states "All objects in the image are now known and their position are
fixed in the form of coordinates e.g. top left hand corner and bottom right hand corner
and step 3 comprises determining position features for all remaining objects", which
corresponds to the object generated is associated with a coordinate indicating a position
of the document).

Regarding claim 4, Hommersom discloses text block is generated by combining
the objects (Hommersom, in col. 3, lines 51-60, states "clustering of adjoining
information carrying pixels are sought in the image and are characterized as characters,
line, graphic or photo and in the second step information of the character type is
divided into text block, lines and words". In the system of Hommersom, the character type
being divided into text blocks [characters are combined as text blocks], lines and words
[characters are combined] corresponds to text block is generated by combining the
objects),

text block is defined as a rectangular region with a minimum area that
includes the objects (Hommersom in figure 1 shows text block is defined as a
rectangular region, and in col. 4, lines 5-10, Hommersom states "all the objects position is
fixed in the form of coordinates e.g. top left hand corner and the bottom right-hand
corner of each object". Since the position of every object is fixed in the form of
coordinates, e.g. the top left-hand corner and the bottom right-hand corner of each object,
it is obvious that the text block is defined as a rectangular region [figure 1] with a
minimum area that includes the objects, because the text block area is defined in terms of its
coordinates),

the position of the text block is specified using the coordinates of opposing
corners of rectangular region in the document (Hommersom in figure 1 shows text block
is defined as a rectangular region, and in col. 4, lines 5-10, Hommersom states "all the
objects position is fixed in the form of coordinates e.g. top left hand corner and the
bottom right-hand corner of each object", which corresponds to text block is specified
using the coordinates of opposing corners of rectangular region in the document).
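
The geometric reasoning in the claim 4 rejection can be made concrete with a short sketch. Assuming each object is stored as `(left, top, right, bottom)`, that is, its top left-hand and bottom right-hand corners, the smallest axis-aligned rectangle enclosing a set of objects follows directly from the extreme coordinates. The function name `bounding_block` and the tuple layout are hypothetical and do not come from Hommersom.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), hypothetical layout


def bounding_block(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned rectangle that contains every given object box."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one object is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    # The two opposing corners are the extreme top-left and bottom-right coordinates.
    return (min(lefts), min(tops), max(rights), max(bottoms))


if __name__ == "__main__":
    print(bounding_block([(10, 20, 50, 40), (30, 10, 80, 35)]))
```

The returned pair of opposing corners is enough to specify both the position and the extent of the block, which is the correspondence the rejection draws.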

Regarding claim 5, Hommersom discloses connection candidate between the
objects is a connection with an object that adjoins the source object on the right side or
a connection with an object that is located in the next line and the left side of the source
object (Hommersom, in col. 6, lines 38-40, states "a text block or title flanked on both
sides by objects belonging to one and same article is added to the article", which
corresponds to connection candidate between the objects is a connection with an object
that adjoins the source object [article] on the right side).

Regarding claim 6, Hommersom shows language model (Hommersom, in col. 4,
lines 23-27, states "the actual analysis of the segmentation image take place using
interpreter with number of rules based on the conventional lay-out of the document
being processed. A set of rules for different documents are stored in the memory". This
corresponds to a rule based language model).

Hommersom however has not disclosed the language model is an N-gram. 

In the same field of endeavor, Cohen shows the language model is an N-gram
(Cohen, in col. 3, lines 60-65, through col. 4, lines 1-6, states "operating on the filtered
sample text, step 12 forms sample N-gram counts as follows: let the filtered sample text
be of length S with symbol s1, s2, sn. Fixing the positive integer n, defining the jth
N-gram gj as the N-long subsequence of the text centered about jth symbol: In other
words an N-long window is slid along the text and pattern at each position of window is
noted" and Cohen, in col. 5, lines 47-58, states "A word is recognized as significant if at
least one of its symbol has score equal to or exceeding the symbol score. Similarly if a
significant N-gram span two words then the combination of two words is significant.
Any number of consecutive words be joined in this fashion". Cohen, in col. 7, lines 22-26,
states "The master N-gram scores are derived from document, but are nonzero if they
are judged to be significant. As more N-grams are removed from candidacy the
processing becomes faster". In the system of Cohen, all this corresponds to a
language model that is an N-gram because, based on the N-gram score of the text window, words
are recognized and consecutive words are combined to form phrases and
sentences, and therefore parts of the document/article are connected).
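
The sliding-window counting that the quoted Cohen passage describes can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `ngram_counts` is hypothetical, the window is indexed by its starting symbol instead of being centered on the jth symbol, and the scoring and significance tests of the reference are not modeled.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every N-long window obtained by sliding along the filtered text."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # One window per start position; texts shorter than n yield no windows.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    print(ngram_counts("the cat the dog", 3).most_common(3))
```

Because blanks are ordinary symbols in the window, an N-gram can span the delimiter between two adjacent words, which is the basis on which the cited passage treats a pair of words as a joined phrase.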

Therefore it would have been obvious to one having ordinary skill in the art at
the time the invention was made to use the N-gram language model in the system of
Hommersom by replacing the rule based language model of Hommersom with the N-gram
model and determining symbol scores of the segmented and filtered objects (words, text blocks and
titles) in the system of Hommersom (Hommersom, col. 4, lines 11-15), because such a
process would verify the segmentation result [words, sentences, titles, text blocks] of
Hommersom and connect the portions of the document/article with higher processing speed (as
stated by Cohen, col. 7, lines 22-26), thereby providing faster recognizing/sorting
of documents/articles (as stated by Cohen in col. 2, lines 20-25).

Regarding claim 10, Hommersom discloses if there is only a single connection
candidate between the objects, the initial text blocks, or the text blocks combined
thereof, combining them without determining validity of connection using a language
model (Hommersom, in col. 3, lines 52-65, states "clusters of adjoining information
carrying pixels is sought in the image and are characterized as line, graphic, or photo.
In addition, characters larger than the average size of the characters are
characterized further as large characters. In the second step the image information of
the character type is divided into text blocks lines word. The segmentation result is
expanded with objects title (a text block or line formed of large characters)". In the
system of Hommersom, a text block or line formed of large characters [title] corresponds
to the initial text blocks, and combining them without determining validity of connection
using a language model [combining them during segmentation, without applying the
rules shown by Hommersom in col. 6, lines 26-55], based on the size of the
characters, corresponds to the single connection candidate).

Regarding claim 11, Hommersom discloses a document processing system for
extracting a text block from a document (Hommersom, in col. 3, lines 1-5, states that FIG. 2
illustrates an apparatus for recognizing separate articles in a source document.
Hommersom, in col. 3, lines 51-60, states "clustering of adjoining information carrying
pixels are sought in the image and are characterized as characters, line, graphic or
photo and in the second step information of the character type is divided into text block,
lines and words" and, in col. 4, lines 12-15, states "segmentation result is
filtered so that only object of the type text block, title, graphic and lines are retained".
In the system of Hommersom [FIG. 2], the step of segmentation and
filtering in which objects of the type text block, title, graphic and lines are retained corresponds
to a document processing for extracting a text block from a document, and the apparatus of
FIG. 2 of Hommersom corresponds to a document processing system for extracting a
text block from a document) comprising the steps:

means for generating objects including characters, marks and other symbols
from the document (Hommersom, in col. 3, lines 51-60, states "clustering of adjoining
information carrying pixels are sought in the image and are characterized as characters,
line, graphic or photo and in the second step information of the character type is
divided into text block, lines and words"; in col. 4, lines 12-15, Hommersom states
"segmentation result is filtered so that only object of the type text block, title, graphic
and lines are retained"; and, in col. 3, lines 1-10, Hommersom states that FIG. 2 illustrates an
apparatus of the invention for recognizing separate articles in a source document. This
apparatus includes a central processing unit and memory disc and, in col. 3, lines
20-25, "The central processing unit is computer having a program for processing". In the
system of Hommersom, retaining objects of the types text block, line and graphic
corresponds to generating and extracting objects including characters [text blocks],
marks [lines] and other symbols [graphics] from the document, and the computer and
computer program of Hommersom shown in FIG. 2 correspond to means for generating
objects including characters, marks and other symbols from the document);

means for generating a connection candidate between the objects
(Hommersom, in col. 4, lines 12-15, states "segmentation result is filtered
so that only object of the type text block, title, graphic and lines are retained" in the
article, and Hommersom, in col. 5, lines 48-55, states "initially all the objects are
designated as an article [part of the article]. The operation of the interpreter is now
intended to combine objects into groups by applying the rule successfully", and
Hommersom, in col. 3, lines 1-10, states that FIG. 2 illustrates an apparatus of the invention
for recognizing separate articles in a source document. This apparatus includes a central
processing unit and memory disc and, in col. 3, lines 20-25, "The central processing unit
is computer having a program for processing". In the system of Hommersom, the
segmentation result, the designation of all objects as an article and the application of rules to
combine objects correspond to generating a connection candidate [article as a whole]
between the objects, and the computer and computer program correspond to means for
generating a connection candidate between the objects); and

means for evaluating validity of connection candidate using a language model
(Hommersom, in col. 4, lines 23-27, states "the actual analysis of the segmentation
image take place using interpreter with number of rules based on the conventional
lay-out of the document being processed. A set of rules for different documents are stored
in the memory" and, in col. 5, lines 48-55, states "initially all the
objects are designated as an article. The operation of the interpreter is now intended to
combine objects into groups by applying the rule successfully. In these condition all the
objects are analyzed consecutively, each object is tested in relationship with all other
objects by reference to the rule applied. If the outcome of the test is positive, the
second object is added attached to it", and Hommersom, in col. 3, lines 1-10, states that
FIG. 2 illustrates an apparatus of the invention for recognizing separate articles in a
source document. This apparatus includes a central processing unit and memory disc
and, in col. 3, lines 20-25, "The central processing unit is computer having a program for
processing". In the system of Hommersom, initially all the objects are designated as an
article and all the objects are analyzed consecutively; each object is tested in
relationship with all other objects by reference to the rule applied and, if the outcome of the
test is positive, the second object is attached to the first object, which corresponds to
evaluating validity of connection candidate using a language model; the interpreter,
which is intended to combine objects into groups by applying the rule, corresponds to the
language model; and the computer and computer program of Hommersom of FIG. 2
correspond to means for evaluating validity of connection candidate using a language
model).

Hommersom, however, has not explicitly shown that characters are laid out using
blank characters, or means for generating blank characters.

In the same field of endeavor, Cohen discloses that characters are laid out using
blank characters and generating blank characters (Cohen, in col. 3, lines 43-53, states
"the sample text is filtered to remove unwanted characters. Typically punctuation and
numerals are replaced by the stop characters flanked by blanks. Formatting codes
such as carriage returns replaced by blanks" and, in col. 7, lines 35-40, states that the
logical steps are implemented using hardware and software [computer and computer
program]. In the system of Cohen, replacing punctuation and numerals by stop
characters flanked by blanks, and replacing formatting codes such as carriage returns
by blanks, corresponds to characters being laid out using blank characters and
generating blank characters, and the implementation of the steps using hardware/computer
and software/computer program corresponds to means for generating blank
characters).

Therefore it would have been obvious to one having ordinary skill in the art at
the time the invention was made to lay out characters using blank characters and
to generate blank characters, as shown by Cohen, in the system of Hommersom by
filtering the segmented objects (Hommersom, col. 4, lines 12-15) to replace unwanted
characters with stop characters flanked by blanks and to replace formatting
codes such as carriage returns with blanks in the system of Hommersom, because
such a process provides a necessary step for connecting words and phrases in the
document (as stated by Cohen in col. 5, lines 35-58, a word is a string of consecutive
symbols which is separated from adjoining symbols by "spaces". Two adjacent words
are joined together as a phrase, including the common delimiter [blank space], if the symbols
on either side of the common delimiter [blank space] jointly contribute to a significantly
scoring n-gram, without requiring training), thereby providing efficient/faster
processing for connecting or combining objects in the document, as stated by Cohen in
col. 7, lines 22-30, without requiring training and rules.

Regarding claim 12, Hommersom discloses means for combining the objects
corresponding to source and destination of connection candidate, if it is determined that
the connection of candidate is valid (Hommersom, in col. 3, lines 1-5, states that FIG. 2
illustrates an apparatus for recognizing separate articles in a source document.
Hommersom, in col. 5, lines 48-55, states "initially all the objects are designated as
an article. The operation of the interpreter is now intended to combine objects into
groups by applying the rule successfully. In these condition all the objects are analyzed
consecutively, each object is tested in relationship with all other objects by reference to
the rule applied. If the outcome of the test is positive, the second object is added
attached to it". In the system of Hommersom, if the outcome of the test is positive, the
second object is attached to the first, which corresponds to determining if a connection is valid.
Hommersom, in col. 5, lines 52-55, states "In these condition all the objects are
analyzed consecutively, each object is tested in relationship with all other objects by
reference to the rule applied. If the outcome of the test is positive, the second object is
added attached to it", which corresponds to combining the objects corresponding to
source [first object] and destination of connection candidate [second object], if it is
determined that the connection of candidate is valid, and the system of Hommersom,
which includes the computer and computer program shown in figure 2, corresponds to
means for combining the objects corresponding to source and destination of connection
candidate, if it is determined that the connection of candidate is valid).

Regarding claim 13, Hommersom discloses the means for associating a
generated object with a coordinate indicating a position of the document (Hommersom,
in col. 4, lines 5-10 and 16-20, states "All objects in the image are now known and their
position are fixed in the form of coordinates e.g. top left hand corner and bottom right
hand corner and step 3 comprises determining position features for all remaining
objects", which corresponds to the object generated is associated with a coordinate
indicating a position of the document, and the system of Hommersom, which includes the
computer and computer program shown in figure 2, corresponds to means for
associating a generated object with a coordinate indicating a position of the document).

Regarding claim 14, Hommersom discloses means for generating text block
by combining the objects (Hommersom, in col. 3, lines 51-60, states "clustering of
adjoining information carrying pixels are sought in the image and are characterized as
characters, line, graphic or photo and in the second step information of the character
type is divided into text block, lines and words". In the system of Hommersom, the character
type being divided into text blocks [characters are combined as text blocks], lines and words
corresponds to text block is generated by combining the objects, and the system of
Hommersom, which includes the computer and computer program shown in figure 2,
corresponds to means for generating text block by combining the objects),

text block is defined as a rectangular region with a minimum area that
includes the objects (Hommersom in figure 1 shows text block is defined as a
rectangular region, and in col. 4, lines 5-10, Hommersom states "all the objects position is
fixed in the form of coordinates e.g. top left hand corner and the bottom right-hand
corner of each object". Since the position of every object is fixed in the form of coordinates,
e.g. the top left-hand corner and the bottom right-hand corner of each object, it is
obvious that the text block is defined as a rectangular region [figure 1] with a minimum area
that includes the objects, because the text block area is defined in terms of its coordinates),

the position of the text block is specified using the coordinates of opposing
corners of rectangular region in the document (Hommersom in figure 1 shows text block
is defined as a rectangular region, and in col. 4, lines 5-10, Hommersom states "all the
objects position is fixed in the form of coordinates e.g. top left hand corner and the
bottom right-hand corner of each object", which corresponds to text block is specified
using the coordinates of opposing corners of rectangular region in the document).

Regarding claim 15, Hommersom discloses connection candidate between the
objects is a connection with an object that adjoins the source object on the right side or
a connection with an object that is located in the next line and the left side of the source
object (Hommersom, in col. 6, lines 38-40, states "a text block or title flanked on both
sides by objects belonging to one and same article is added to the article", which
corresponds to connection candidate between the objects is a connection with an object
that adjoins the source object [article] on the right side).

Regarding claim 16, Hommersom shows language model (Hommersom in col. 
4, lines 23-27, states "the actual analysis of the segmentation image take place using 
interpreter with number of rules based on the conventional lay-out of the document 
being processed. A set of rules for different documents are stored in the memory". This 
corresponds to rule based language model). 

Hommersom however has not disclosed the language model is an N-gram. 

In the same field of endeavor, Cohen shows the language model is an N-gram
(Cohen, in col. 3, lines 60-65, through col. 4, lines 1-6, states "operating on the filtered
sample text, step 12 forms sample N-gram counts as follows: let the filtered sample text
be of length S with symbol s1, s2, sn. Fixing the positive integer n, defining the jth
N-gram gj as the N-long subsequence of the text centered about jth symbol: In other
words an N-long window is slid along the text and pattern at each position of window is
noted" and Cohen, in col. 5, lines 47-58, states "A word is recognized as significant if at
least one of its symbol has score equal to or exceeding the symbol score. Similarly if a
significant N-gram span two words then the combination of two words is significant.
Any number of consecutive words be joined in this fashion". Cohen, in col. 7, lines 22-26,
states "The master N-gram scores are derived from document, but are nonzero if they
are judged to be significant. As more N-grams are removed from candidacy the
processing becomes faster". In the system of Cohen, all this corresponds to a
language model that is an N-gram because, based on the N-gram score of the text window, words
are recognized and consecutive words are combined to form phrases and
sentences, and therefore parts of the document/article are connected).

Therefore it would have been obvious to one having ordinary skill in the art at
the time the invention was made to use the N-gram language model in the system of
Hommersom by replacing the rule based language model of Hommersom with the N-gram
model and determining symbol scores of the segmented and filtered objects (words, text blocks and
titles) in the system of Hommersom (Hommersom, col. 4, lines 11-15), because such a
process would verify the segmentation result [words, sentences, titles, text blocks] of
Hommersom and connect the portions of the document/article with higher processing speed (as
stated by Cohen, col. 7, lines 22-26), thereby providing faster recognizing/sorting
of documents/articles (as stated by Cohen in col. 2, lines 20-25).

Regarding claim 20, Hommersom discloses if there is only a single connection
candidate between the objects, the initial text blocks, or the text blocks combined
thereof, combining them without determining validity of connection using a language
model (Hommersom, in col. 3, lines 52-65, states "clusters of adjoining information
carrying pixels is sought in the image and are characterized as line, graphic, or photo.
In addition, characters larger than the average size of the characters are
characterized further as large characters. In the second step the image information of
the character type is divided into text blocks lines word. The segmentation result is
expanded with objects title (a text block or line formed of large characters)". In the
system of Hommersom, a text block or line formed of large characters [title] corresponds
to the initial text blocks, and combining them without determining validity of connection
using a language model [combining them during segmentation, without applying the
rules shown by Hommersom in col. 6, lines 26-55], based on the size of the
characters, corresponds to the single connection candidate).

Regarding claim 21, Hommersom discloses a computer-readable medium
having recorded thereon a program for causing computer (Hommersom, in col. 3, lines
1-10, states that FIG. 2 illustrates an apparatus of the invention for recognizing separate
articles in a source document. This apparatus includes a central processing unit and
memory disc and, in col. 3, lines 20-25, "The central processing unit is computer having
a program for processing". This corresponds to a computer-readable medium having
recorded thereon a program for causing a computer)

extracting a text block from a document (Hommersom in col. 3, lines 1-5, 
states FIG. 2 illustrates an apparatus for recognizing separate articles in a source 
document. Hommersom, in col. 3, lines 51-60, states "clustering of adjoining 
information carrying pixels are sought in the image and are characterized as characters, 
line, graphic or photo and in the second step information of the character type is 
divided into text block, lines and words" and in col. 4, lines 12-15, Hommersom states 
"segmentation result is filtered so that only object of the type text block, title, graphic 
and lines are retained". In the system of Hommersom the step of segmentation and 
filtering in which objects of the type text block, title, graphics and lines are retained 
corresponds to a document processing for extracting a text block from a document) 
comprising the steps: 

generating objects including characters, marks and other symbols from the 
document (Hommersom, in col. 3, lines 51-60, states "clustering of adjoining information 
carrying pixels are sought in the image and are characterized as characters, line, 
graphic or photo and in the second step information of the character type is divided into 
text block, lines and words" and in col. 4, lines 12-15, Hommersom states 
"segmentation result is filtered so that only object of the type text block, title, graphic 
and lines are retained". In the system of Hommersom, retaining objects of the types text 
blocks, lines and graphics corresponds to generating and extracting objects 
including characters [text blocks], marks [lines] and other symbols [graphics] from the 
document); 

generating a connection candidate between the objects (Hommersom, in col. 4, 
lines 12-15, states "segmentation result is filtered so that only object of 
the type text block, title, graphic and lines are retained" in the article, and Hommersom 
in col. 5, lines 48-55, states "initially all the objects are designated as an article 
[part of the article]. The operation of the interpreter is now intended to combine objects 
into groups by applying the rule successfully". In the system of Hommersom the 
segmentation result, the designation of all objects as an article, and applying rules to 
combine objects correspond to generating a connection candidate [article as a whole] 
between the objects); and 

evaluating validity of connection candidate using a language model 
(Hommersom in col. 4, lines 23-27, states "the actual analysis of the segmentation 
image take place using interpreter with number of rules based on the conventional 
layout of the document being processed. A set of rules for different documents are stored 
in the memory" and Hommersom in col. 5, lines 48-55, states "initially all the 
objects are designated as an article. The operation of the interpreter is now intended to 
combine objects into groups by applying the rule successfully. In these conditions all the 
objects are analyzed consecutively, each object is tested in relationship with all other 
objects by reference to the rule applied. If the outcome of the test is positive, the 
second object is added attached to it". In the system of Hommersom, initially all the 
objects are designated as an article and all the objects are analyzed consecutively; 
each object is tested in relationship with all other objects by reference to the rule 
applied, and if the outcome of the test is positive, the second object is added attached 
to the first object. This corresponds to evaluating validity of the connection candidate 
using a language model, and the interpreter, which is intended to combine objects into 
groups by applying the rules, corresponds to the language model). 
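
For illustration only, the grouping procedure quoted above (every object starts as 
its own article, each object is tested against all other objects by a rule, and a 
positive test attaches the second object to the first) can be sketched in Python. 
The function name group_objects and the adjacency rule in the usage note are 
hypothetical; Hommersom's actual layout rules are not reproduced here. 

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def group_objects(objects: Sequence[T], rule: Callable[[T, T], bool]) -> List[List[T]]:
    """Group objects by testing every pair against a rule (hypothetical sketch)."""
    # Initially every object is designated as its own article (group).
    parent = list(range(len(objects)))

    def find(i: int) -> int:
        # Follow parent links to the representative of the group containing i.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Test each object in relationship with all other objects.
    for i in range(len(objects)):
        for j in range(len(objects)):
            if i != j and rule(objects[i], objects[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Positive outcome: attach the second object's group to the first.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: Dict[int, List[T]] = {}
    for i, obj in enumerate(objects):
        groups.setdefault(find(i), []).append(obj)
    return list(groups.values())
```

As a usage example under an assumed rule, group_objects([0, 1, 2, 10, 11, 30], 
lambda a, b: abs(a - b) <= 1) groups numbers that differ by at most one, standing 
in for objects that satisfy a layout rule. 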

Hommersom, however, has not explicitly shown that characters are laid out using 
blank characters and generating blank characters. 

In the same field of endeavor Cohen discloses characters are laid out using 
blank characters and generating blank characters (Cohen in col. 3, lines 43-53, states 
"the sample text is filtered to remove unwanted characters. Typically punctuation and 
numerals are replaced by the stop characters flanked by blanks. Formatting codes such 
as carriage returns are replaced by blanks". In the system of Cohen punctuation and 
numerals are replaced by the stop characters flanked by blanks, and formatting codes 
such as carriage returns are replaced by blanks, which corresponds to characters are 
laid out using blank characters and generating blank characters). 
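
For illustration only, the filtering step attributed to Cohen above can be sketched 
in Python. The function name filter_sample_text, the choice of "_" as the stop 
character, the treatment of a run of punctuation or numerals as a single stop 
character, and the collapsing of repeated blanks are all assumptions made for this 
sketch and are not taken from Cohen. 

```python
import re

STOP = "_"  # assumed placeholder; the quoted passage does not name the stop character


def filter_sample_text(text: str, stop: str = STOP) -> str:
    """Replace formatting codes with blanks and punctuation/numerals with stops."""
    # Formatting codes such as carriage returns are replaced by blanks.
    text = re.sub(r"[\r\n\t]", " ", text)
    # Punctuation and numerals are replaced by the stop character flanked by blanks.
    # Assumption: a run of such characters becomes a single stop character.
    text = re.sub(r"[^A-Za-z ]+", lambda m: f" {stop} ", text)
    # Assumption: runs of blanks are collapsed so words are separated by one blank.
    return re.sub(r" +", " ", text).strip()
```

For example, filter_sample_text("Hello, world!\r\nPage 42") yields 
"Hello _ world _ Page _" under these assumptions. 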

Therefore it would have been obvious to one having ordinary skill in the art at 
the time the invention was made to lay out characters using blank characters and 
generate blank characters, as shown by Cohen, in the system of Hommersom by 
filtering the segmented objects (Hommersom col. 4, lines 12-15) to replace unwanted 
characters with the stop characters flanked by blanks and replacing formatting 
codes such as carriage returns by blanks in the system of Hommersom, because 
such a process provides a necessary step for connecting words and phrases in the 
document (as stated by Cohen in col. 5, lines 35-58, a word is a string of consecutive 
symbols which is separated from adjoining symbols by "spaces". Two adjacent words 
are joined together as a phrase, including the common delimiter [blank space], if the 
symbols on either side of the common delimiter [blank space] jointly contribute to 
significantly scoring n-grams without requiring training), thereby providing 
efficient/faster processing for connecting or combining objects in the document, as 
stated by Cohen in col. 7, lines 22-30. 

Allowable Subject Matter 

3. Claims 7-9 and 17-19 are objected to as being dependent upon a rejected base claim, 
but would be allowable over the prior art of record if rewritten in independent form 
including the limitations of the base claim and any intervening claims. 

Communication 

4. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Sherali Ishrat whose telephone number is 
571-272-7398. The examiner can normally be reached from 8:00 AM to 4:30 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Joseph Mancuso, can be reached on 571-272-7695. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 